Goto

Collaborating Authors

 bayesian network


A Stable Distance Persistence Homology for Dynamic Bayesian Network Clustering

arXiv.org Machine Learning

Dynamic Bayesian networks (DBNs) are a widely used framework for modeling systems whose probabilistic structure evolves over time. Standard inference methods focus on local conditional distributions and can miss larger-scale patterns in how dependencies between variables organize and change over time. We introduce a topological approach to this problem. To each DBN we associate a time-varying graph, called a Dynamic Bayesian Graph (DBG), by assigning to each edge a strength that measures variation in its conditional dependence across parent configurations, and retaining edges whose strength exceeds a chosen threshold. We show that this construction fits within the dynamic graph framework of Kim and Mémoli, enabling the use of tools from topological data analysis. Applying persistent homology to a DBG produces a barcode, which records the merging and disappearance of connected groups of strongly dependent variables over time. We prove that this barcode is stable: small perturbations in the conditional probability tables of the DBN lead to small changes in the resulting barcode. This yields a principled and noise-resistant summary of how dependency structure evolves in a dynamic Bayesian network.


Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions

Neural Information Processing Systems

Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the superstructure. Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy.


Recursive Bayesian Networks: Generalising and Unifying Probabilistic Context-Free Grammars and Dynamic Bayesian Networks

Neural Information Processing Systems

Probabilistic context-free grammars (PCFGs) and dynamic Bayesian networks (DBNs) are widely used sequence models with complementary strengths and limitations. While PCFGs allow for nested hierarchical dependencies (tree structures), their latent variables (non-terminal symbols) have to be discrete. In contrast, DBNs allow for continuous latent variables, but the dependencies are strictly sequential (chain structure). Therefore, neither can be applied if the latent variables are assumed to be continuous and also to have a nested hierarchical dependency structure. In this paper, we present Recursive Bayesian Networks (RBNs), which generalise and unify PCFGs and DBNs, combining their strengths and containing both as special cases. RBNs define a joint distribution over tree-structured Bayesian networks with discrete or continuous latent variables. The main challenge lies in performing joint inference over the exponential number of possible structures and the continuous variables. We provide two solutions: 1) For arbitrary RBNs, we generalise inside and outside probabilities from PCFGs to the mixed discrete-continuous case, which allows for maximum posterior estimates of the continuous latent variables via gradient descent, while marginalising over network structures.


Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences

arXiv.org Machine Learning

We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(f,Γ)$-divergences, we prove a new infimal subadditivity principle showing that, under suitable conditions, a global variational discrepancy is controlled by an average of family-level discrepancies aligned with the graph. In an additive regime, the surrogate is exact. This closes a theoretical gap in the literature; existing subadditivity results justify graph-informed adversarial learning for classical discrepancies, but not for interpolative divergences, where the usual factorization argument breaks down. In turn, we provide a justification for replacing a standard, graph-agnostic GAN with a monolithic discriminator by a graph-informed GAN (GiGAN) with localized family-level discriminators, without requiring the optimizer itself to factorize according to the graph. We also obtain parallel results for integral probability metrics and proximal optimal transport divergences, identify natural discriminator classes for which the theory applies, and present experiments showing improved stability and structural recovery relative to graph-agnostic baselines.


Entropy testing and its application to testing Bayesian networks

Neural Information Processing Systems

This paper studies the problem of \emph{entropy identity testing}: given sample access to a distribution $p$ and a fully described distribution $q$ (both are discrete distributions over the support of size $k$), and the promise that either $p = q$ or $ | H(p) - H(q) | \geqslant \varepsilon$, where $H(\cdot)$ denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability.


Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity

Neural Information Processing Systems

Learning the directed acyclic graph (DAG) structure of a Bayesian network from observational data is a notoriously difficult problem for which many non-identifiability and hardness results are known. In this paper we propose a provably polynomial-time algorithm for learning sparse Gaussian Bayesian networks with equal noise variance --- a class of Bayesian networks for which the DAG structure can be uniquely identified from observational data --- under high-dimensional settings. We show that $O(k^4 \log p)$ number of samples suffices for our method to recover the true DAG structure with high probability, where $p$ is the number of variables and $k$ is the maximum Markov blanket size. We obtain our theoretical guarantees under a condition called \emph{restricted strong adjacency faithfulness} (RSAF), which is strictly weaker than strong faithfulness --- a condition that other methods based on conditional independence testing need for their success. The sample complexity of our method matches the information-theoretic limits in terms of the dependence on $p$.


Learning and Testing Causal Models with Interventions

Neural Information Processing Systems

We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network M on a graph with n discrete variables and bounded in-degree and bounded ``confounded components'', we show that O(log n) interventions on an unknown causal Bayesian network X on the same graph, and O(n/epsilon^2) samples per intervention, suffice to efficiently distinguish whether X=M or whether there exists some intervention under which X and M are farther than epsilon in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: Omega(log n) interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.


Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra

Neural Information Processing Systems

The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea's previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis. Furthermore, we significantly improve Didea's runtime performance through successive optimizations to its message passing schedule and derive explicit connections between Didea's new concave score and related MS/MS scoring functions.


Entropy testing and its application to testing Bayesian networks

Neural Information Processing Systems

This paper studies the problem of entropy identity testing: given sample access to a distribution p and a fully described distribution q (both discrete distributions over a domain of size k), and the promise that either p = q or |H (p) H (q)| ε, where H () denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability.